ABSTRACT

Translator Writing Tools (TWT) is a complex system composed of two
languages, EOL-4 and MOL, and of certain theoretical concepts which tie
these two languages together. TWT is intended for educational purposes
and for actual translator writing. The EOL-4 language is designed to deal
with recognition problems such as lexical and syntactic analysis and the
manipulation of symbol tables. The results generated by EOL-4 programs
have a tree-structured format and are subsequently interpreted by MOL
programs, which generate code for any target machine. The TWT system can
be easily transferred from one computer to another, together with all
the programs written in it.


1.   INTRODUCTION 

In the translator writing domain many different techniques are
presently used for lexical and syntactical analysis, symbol table
manipulation and code generation. However, the coding of these techniques
for concrete cases is usually time-consuming and requires a new effort for
every new object machine. This situation is reminiscent of the general
situation in numerical computation in the early fifties, when the
implementation of numerical algorithms required time-consuming coding for
every particular computer. Only with the advent of universal higher-level
programming languages (beginning with FORTRAN in 1954) was this problem
satisfactorily settled. None of these languages reflected any specific
algorithm; they were sufficient, however, to express most of the known
programming techniques easily.

Similar  tools  are  needed  today  in  the  translator  writing  domain. 
We  need  universal  higher-level  languages,  in  which  translator  writing 
techniques  can  be  programmed  without  excessive  effort. 

The  best  solution  to  the  above  problem  would  be  to  design  a 
single  language  for  expressing  algorithms  for  all  the  translation  phases. 
However,  such  an  attempt  appears  very  difficult  because  the  required 
language  should  serve  simultaneously  two  different  purposes,  namely  source 
language  recognition  on  one  hand  and  object  code  generation  on  the  other. 
This  task  is,  however,  substantially  facilitated  if  we  accept,  as  the 
solution,  two  languages  which  are  different  but  well  matched  to  each  other: 
the  first  one  for  recognition  problems  and  the  second  one  for  code  generation. 
This paper describes a system based on this idea, with two languages,
EOL-4 and MOL, serving the two purposes.

The language EOL-4 (Expression Oriented Language 4) is designed
to deal with recognition problems which occur during the initial phases
of translation, such as lexical and syntactic analysis and symbol table
manipulation. Most of the techniques recently used in this domain can
be easily coded in this language. The characteristic feature of EOL-4 is
the multiplicity of data structures it can deal with. Simple integers,
lists and records can be processed by EOL-4 in main storage; large amounts
of information may be kept and shuffled in secondary storage. The result of
EOL-4 processing may be presented as a sequence of lists (a file) with a
keyword attached to each list. The keywords can serve as names and the
lists as arguments of the macro-calls interpreted subsequently by MOL.

The language MOL (Macro Oriented Language) is a powerful
macro-processor, equipped with special facilities to process the
macro-calls generated by EOL-4 and to generate code for any object
computer. Lists of any depth can be processed by MOL.

It should be noticed here that the idea of using two languages
in a translator writing system, the first for recognition and the second for
code generation, has already been used in a few systems (see [8] and [9]).
What is new in our case, however, is the generality of EOL-4 and MOL, the
possibility of essential optimization of the translation time by means of
hand-written patches included in the EOL-4 programs, and the ease with which
EOL-4 and MOL can be transferred from one computer to another. By
imposing no limitations on the translation methods used, this system makes
it possible to obtain as good an object code as needed.

Two versions of EOL have already been implemented. The first one,
called EOL-2, was implemented in 1967 on the Polish computer ZAM 41 [1].
An improved version, called EOL-3, was implemented at the University of
Illinois in 1968 on the IBM 7094 and 360 computers [2,3]. Both EOL-2 and
EOL-3 were used for educational purposes, for instance, in writing students'
term projects. EOL-2 was also used for implementing actual languages such
as subsets of COBOL and ALGOL 60. This application showed that the use of
EOL-2 can considerably lower the time and effort needed to write
translators. In these translators, EOL-2 co-worked with a rather primitive
macro-processor, and it was clear that a more powerful macro-processor
would improve the translation efficiency significantly.

At present EOL-4 and MOL are being implemented in Warsaw on the
ZAM 41 computer, and once this is done the transfer to other computers
should not create serious difficulties. EOL-4 takes into account all the
experience gained with the previous versions of EOL. It has also been
influenced by NUCLEOL [4] in its choice of data structure (trees) and
operations thereon.


2.   A  SCHEME  FOR  USING  TWT 

2.1.   Defining  the  Semantics  of  a  Language 

In order to present TWT let us describe its typical application
to implement a procedural language of the type of ALGOL 60 or FORTRAN V.

We usually start this task by establishing the formal definition
of the language concerned, that is, by defining an abstract sequential
machine $M_A$ for the language under consideration. This machine will be
used later on as a model simulated by the actual machine. The operation of
$M_A$ can be described using abstract objects, developed by the IBM
Laboratory Vienna [6]. The application of abstract objects to describe a
sequential computer is given as example 4 in [7]. The machine language A
of the machine $M_A$ is the object language into which the programs written
in the source language S are translated. The translator $\tau_{SA}$,
therefore, is a function which to every program $s \in S$ assigns a program
$a \in A$, that is

    $$\tau_{SA} : S \to A \quad \text{or} \quad a = \tau_{SA}(s)$$
Graphically, we can represent this relation by

    [Diagram: Source Language S --> $\tau_{SA}$ (Translator) --> Object Language A]


The translator $\tau_{SA}$ can be described formally using as a basis the
syntactical tree of the source language S. By transforming, according
to established rules, the syntax tree of a source program $s \in S$ we
eventually reach the syntax tree of an object program $a \in A$. We always
assume that the grammar of the source language S is unambiguous and that
the syntactical tree of each program $s \in S$ can be found by some parsing
method.

The formal definition of the abstract machine $M_A$ and of the
translator $\tau_{SA}$ enables us to define the semantics of the source
language S. To every program $s \in S$ we assign a corresponding object
program $a = \tau_{SA}(s)$. This program, together with the input data
$d \in D$, we consider as the initial state of the machine $M_A$. The
computational process, or simply the computation, beginning with this
state we define as the meaning of the object program a. Denoting the set
of all possible computations of the machine $M_A$ by C, we can define the
semantics $Sem_A$ of A as the function

    $$Sem_A : A \times D \to C \quad \text{or} \quad c = Sem_A(a,d)$$

which to every program $a \in A$ with input data $d \in D$ assigns its
meaning $c \in C$. Having in mind the function $\tau_{SA} : S \to A$ we can
now define the semantics of the language S with input data D as

    $$Sem_S : S \times D \to C \quad \text{or} \quad c = Sem_S(s,d)$$

From the above relations we obtain

    $$c = Sem_S(s,d) = Sem_A(\tau_{SA}(s),d)$$

The computation $c \in C$ assigned to a program $s \in S$ and data $d \in D$
(that is, $c = Sem_S(s,d)$) we also call the interpretation of the program s
with the data d by the machine $M_A$.

The scheme described above could serve as the basis for writing a
translator. In general, however, it is convenient to insert between the
source and the object language an additional intermediate language I. Now
we have

    $$\tau_{SI} : S \to I , \qquad \tau_{IA} : I \to A$$

and

    $$\tau_{SA} : S \to A$$

where $\tau_{SA} = \tau_{IA} \circ \tau_{SI}$ (the circle denotes the
composition of functions).


This result can be represented graphically as

    [Diagram: Source Language --> Intermediary Language --> Object Language,
    with $\tau_{SA}$ leading directly from Source Language to Object Language]


For our purposes the language I should be designed in such a way
that the translator $\tau_{SI}$ is easy to code in EOL-4, and $\tau_{IA}$
in MOL. This means also that all the sentences of the language I should
have the form of MOL macro-calls.


2.2.   Example 

Let us assume that a sentence in a source language S is the
following

    Y = A*(B*C + DX)                                             (1)

This sentence might be translated by $\tau_{SI}$ into the following MOL
macro-call

    * ASSIGN Y = (A * ((B * C) + DX))

which may be represented graphically as the following nonlabelled syntax
tree with the name (key) ASSIGN:

    [Diagram: syntax tree with the key ASSIGN]

The call of the macro-definition ASSIGN might lead to the
following sequence of statements (instructions) of the machine $M_A$.


    PUSH   A
    PUSH   B
    PUSH   C
    MUL
    PUSH   DX
    ADD
    MUL
    STORE  Y

If the machine $M_A$ has a stack for computing arithmetic expressions,
this sequence causes the computation of the expression

    A*(B*C + DX)

and stores the result at the address Y. This sequence also defines the
fragment of computation corresponding to the formula (1).
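
The example can be traced with a short Python sketch. It is only an
illustration, not MOL: the tuple encoding of the syntax tree, the function
names gen_expr, assign and run, and the numeric values given to A, B, C
and DX are all assumptions made here. The instruction names PUSH, MUL, ADD
and STORE are those of the example.

```python
# Hypothetical sketch: expand the tree form of
#   * ASSIGN Y = (A * ((B * C) + DX))
# into stack-machine instructions, then execute them.

def gen_expr(node):
    """Post-order walk: code for both operands, then the operator."""
    if isinstance(node, str):              # a leaf is a variable name
        return [("PUSH", node)]
    left, op, right = node                 # an inner node is (left, op, right)
    opcode = {"*": "MUL", "+": "ADD"}[op]
    return gen_expr(left) + gen_expr(right) + [(opcode, None)]

def assign(target, expr):
    """Plays the role of the macro-definition ASSIGN."""
    return gen_expr(expr) + [("STORE", target)]

def run(code, memory):
    """Interpret the instruction sequence on a machine with one stack."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(memory[arg])
        elif opcode == "STORE":
            memory[arg] = stack.pop()
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right if opcode == "MUL" else left + right)
    return memory

tree = ("A", "*", (("B", "*", "C"), "+", "DX"))
code = assign("Y", tree)
memory = run(code, {"A": 2, "B": 3, "C": 4, "DX": 5})
print(memory["Y"])   # 2 * (3 * 4 + 5) = 34
```

The generated list reproduces the eight instructions above, and running it
is one concrete instance of the computation that the text calls the meaning
of the sentence.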

2.3.   Translating  the  Concrete  Machine  Language 

When we have to do with real machines $M_1, M_2, \ldots$ instead of an
abstract machine $M_A$, the corresponding MOL macro-definitions should be
changed to generate instructions for the real machines instead of the
abstract machine.

We come, therefore, to the scheme

    [Diagram: Source Language --> (written in EOL: $\tau_{SI}$) -->
    Intermediary Language I --> (written in MOL: $\tau_{IA}$,
    $\tau_{IR_1}$, $\tau_{IR_2}$, ...) --> Abstract Machine Language,
    $R_1$ Machine Language, $R_2$ Machine Language, ...]

where $\tau_{IR_k}$ (k = 1,2,...,n) translates from the intermediary
language I to the machine language $R_k$.

In the above scheme the $\tau_{IA}$ translator (in the form of a sequence
of macro-definitions) forms a model for the translators $\tau_{IR_1}$,
$\tau_{IR_2}$, ... . In fact, it is not absolutely necessary to determine
$\tau_{IA}$, but in practice it is very useful to do so, not only for
creating a model for subsequent translation for actual computers, but also
for the source language definition.
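
A rough Python sketch of this retargeting idea follows. The intermediate
sentence is kept fixed and only the function standing in for the set of
macro-definitions is exchanged. The three-address target named "R1" is
invented for the illustration and does not correspond to any machine in
this paper; the tuple encoding and all function names are likewise
assumptions.

```python
# Hypothetical sketch: one intermediate sentence, two sets of
# "macro-definitions" producing code for two different targets.

def to_stack_code(call):
    """Stand-in for macro-definitions aimed at an abstract stack machine."""
    _, target, expr = call
    def walk(node):
        if isinstance(node, str):
            return [f"PUSH {node}"]
        left, op, right = node
        return walk(left) + walk(right) + [{"*": "MUL", "+": "ADD"}[op]]
    return walk(expr) + [f"STORE {target}"]

def to_three_address(call):
    """Stand-in for macro-definitions aimed at a made-up three-address machine."""
    _, target, expr = call
    lines = []
    def walk(node):
        if isinstance(node, str):
            return node
        left, op, right = node
        left_name, right_name = walk(left), walk(right)
        temp = f"T{len(lines) + 1}"        # next unused temporary
        lines.append(f"{temp} = {left_name} {op} {right_name}")
        return temp
    result = walk(expr)
    lines.append(f"{target} = {result}")
    return lines

# Exchanging the back end does not touch the intermediate sentence.
BACK_ENDS = {"abstract": to_stack_code, "R1": to_three_address}
sentence = ("ASSIGN", "Y", ("A", "*", (("B", "*", "C"), "+", "DX")))
abstract_code = BACK_ENDS["abstract"](sentence)
r1_code = BACK_ENDS["R1"](sentence)
```

Here to_stack_code plays the part of the model translator, and
to_three_address the part of a translator for one concrete machine written
after that model.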


2.4.   Other  Translation  Schemes

In some cases the above scheme can be simplified by assuming that
the intermediary language I and the abstract machine language A are
identical. Then the machine $M_A$ is simulated on machines
$M_1, M_2, \ldots$ with languages $R_1, R_2, \ldots$ . We come, therefore,
to the scheme:

    [Diagram: Source Language --> (written in EOL: $\tau_{SA}$) -->
    Abstract Machine Language A --> (written in MOL: $\tau_{AR_1}$,
    $\tau_{AR_2}$, ...) --> $R_1$ Machine Language, $R_2$ Machine
    Language, ...]

where $\tau_{AR_k}$ means the translator from the abstract machine
language A to the internal form of the object machine language $R_k$.

This scheme is advantageous when the object program is executed
by interpretation of an abstract machine language A. In other cases it may
lead to inefficient object code.

A similar scheme is valid for translation of problem-oriented
languages. Let us assume that every program written in a problem-oriented
language S can be translated into a program written in a higher-level
procedural language H, e.g., PL/1. Then we have the following scheme

    [Diagram: Problem-oriented Language S --> (written in EOL:
    $\tau_{SH}$) --> Higher-level Language H]

The translator $\tau_{SH}$ can be completely written in EOL.

The above scheme may be extended by introducing an intermediate
language I, built of sentences which have the form of MOL macro-calls.
Depending on the macro-definitions used we can obtain translators
$\tau_{SH_1}, \tau_{SH_2}, \ldots$ for the higher-level languages
$H_1, H_2, \ldots$, according to the following scheme:

    [Diagram: Problem-oriented Language S --> (written in EOL) -->
    Intermediate Language I --> (written in MOL) --> Higher-Level
    Languages $H_1, H_2, \ldots$]

As the languages $H_1, H_2, \ldots$ one can choose FORTRAN IV, ALGOL 60,
PL/1, etc.

2.5.   Using  EOL  and  MOL  for  the  Translation  Schemes

Using EOL and MOL in the preceding translation schemes provides
us with many advantages. EOL and MOL reduce coding effort considerably in
comparison with machine-language programming. Moreover, the part of the
translator written in EOL-4 is independent of any concrete computer and
needs to be written only once. In the case when an abstract machine serves
as a model of concrete machines (see Section 2.3), the macro-definitions
which define the translator $\tau_{IA}$ (from the intermediary language to
the abstract machine language) serve as models for the macro-definitions
which define $\tau_{IR_k}$ (for concrete machines).

Implementing a new language requires an initial effort for its
formal definition and for writing the EOL and MOL parts of the translator.
This effort, however, has to be made only once. Having done this, the
additional effort needed to write the translator for a concrete machine is
comparatively small. This is, of course, valid under the assumption that
EOL-4 and MOL are already implemented on the object computer. However, as
we will see in Section 5, by using EOL-4 and MOL to help implement
themselves and by using some additional tools, this task also may be
considerably simplified.


3.   EOL-4  LANGUAGE

EOL-4 is a language for symbol manipulation. In particular, most
of the techniques for lexical and syntax analysis may be coded in EOL-4.

The EOL-4 language can be described as the programming language
of the hypothetical EOL-4 computer shown in Figure 1. The EOL-4 computer is
intended to be simulated on a conventional computer and is therefore a pure
software system. Specifically, most fixed-length-word binary computers are
well suited for simulation of the EOL-4 computer.

The particular parts of the EOL-4 computer have the following
objectives.


[Figure: block diagram connecting Input, Counters, Stacks, Shelves, Files
and Output; the Files lead to MOL]

Figure 1.   Block Diagram of EOL-4 Computer

3.1.   Counters

Counters  contain  integers  on  which  four  arithmetical  operations 
can  be  performed.  Counters  serve  also  to  count  and  store  such  integers  as 
the  current  level  of  block  nesting,  or  the  current  number  of  a  sentence 
being  analyzed,  etc. 

3.2.   Input  and  Output

In EOL-4 input and output data are represented in the form of
strings of arbitrary characters. In EOL-4 programs, the input is identified
by the variable INPUT, the output by the variable OUTPUT. To illustrate
values of strings denoted by these variables we write, for example,

INPUT:  ABCD 
OUTPUT:  RESULT  =  1; 

The  above  notation  is  intended  exclusively  for  illustrative 
purposes  and  is  neither  a  part  of  a  program  nor  does  it  serve  to  determine 
the  data.  We  shall  make  use  of  a  similar  auxiliary  notation  to  illustrate 
values  of  lists  and  files. 

The  input  data  are  read  character  by  character  so  that  each  time 
the  first  character  of  the  indicated  string  is  read  it  is  simultaneously 
deleted  from  the  string.   The  output  data  are  written  by  adding  new 
characters  at  the  end  of  the  indicated  output  string. 

Assembling the input characters into symbols, or detecting a
specified character pattern, is done by a program. Because of this, input
and output data may have whatever form is most convenient for the given
program.




3.3.   Stacks 

The  input  data  are  used  to  build  expressions  which  may  be  stored 
in  stacks  or  in  files. 

In  order  to  illustrate  the  content  of  stacks  we  write,  for  example, 

    EX:   ~XA °3 ^PH

    WX1:  Y = A1 * {B * {ALFA + 3}}

where the identifiers EX and WX1 are names of stacks. The contents of
stacks are linear lists composed of constituents, each one preceded by one
of the marks ~ or ° or ^, or enclosed in curly brackets { and }.

The marks determine the type of constituent inside the computer.

    The mark ~ indicates a character string of arbitrary length.

    The mark ° indicates an integer.

    The mark ^ indicates an address of a record in a file or a
    block in a shelf.

    The brackets { and } assemble several constituents into one
    constituent. As the brackets may be nested, any
    list may be expressed as a single constituent.

In EOL-4 there are many instructions to facilitate flexible
processing of stack contents, for example:

(a)  Moving  a  determined  number  of  constituents  from  the  top  of 
one  stack  to  the  top  or  to  the  end  of  another  stack,  the  original  order  of 
constituents  being  kept  or  reversed. 

(b)  Concatenating  several  strings  starting  at  the  top  of  a  stack 
into  one  string. 

(c)  Splitting  one  string  at  the  top  of  a  stack  into  separate 
characters,  and  putting  them  at  the  beginning  of  a  stack. 

(d)  Testing whether the constituent at the top of a stack equals
a given string or integer or whether it belongs to a specified class of
constituents, for instance, whether it is a letter.

(e)  Enclosing the content of a stack in brackets, or removing
the brackets from the constituent at the top of a stack.
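
The operations (a), (b), (c) and (e) can be pictured with a small Python
model. This is only a sketch under assumptions made here: a stack is a
Python list whose element 0 is the top, strings and integers stand for
marked constituents, a nested list stands for a bracketed constituent, and
the function names are invented rather than EOL-4 instruction names.

```python
# Hypothetical model of an EOL-4 stack: a Python list, index 0 is the top.

def move(src, dst, count, to_end=False, reverse=False):
    """(a) Move `count` constituents from the top of src to dst."""
    taken = [src.pop(0) for _ in range(count)]
    if reverse:
        taken.reverse()
    if to_end:
        dst.extend(taken)
    else:
        dst[0:0] = taken

def concatenate(stack, count):
    """(b) Join `count` strings at the top into one string."""
    stack[:count] = ["".join(stack[:count])]

def split(stack):
    """(c) Replace the top string by its separate characters."""
    stack[0:1] = list(stack[0])

def enclose(stack):
    """(e) Enclose the whole content in one pair of brackets."""
    stack[:] = [list(stack)]

def unbracket(stack):
    """(e) Remove the brackets from the top constituent."""
    stack[0:1] = stack[0]

ex, wx = ["XA", 3], []
move(ex, wx, 2, reverse=True)      # wx now holds 3 on top of "XA"

s = ["AL", "FA", "+"]
concatenate(s, 2)                  # ["ALFA", "+"]
split(s)                           # ["A", "L", "F", "A", "+"]
enclose(s)                         # [["A", "L", "F", "A", "+"]]
```

Calling unbracket(s) afterwards restores the five separate constituents.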

Inside a computer, stacks are held in store in the form of lists
composed of pairs of computer words. One word of such a pair comprises a
part of the constituent; the other word comprises the address which points
to the next pair of words. A separate "free storage list" is the list from
which new pairs are automatically taken and to which disused pairs are
automatically returned.
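
The storage scheme just described resembles a classical linked list with a
free list. The following Python sketch shows one way such a store could
work; the class name PairStore, the use of -1 as the end-of-list address
and the fixed store size are assumptions for illustration and are not
taken from the EOL-4 implementation.

```python
# Hypothetical sketch of pair storage with a free storage list.

class PairStore:
    def __init__(self, size):
        self.value = [None] * size                  # first word of each pair
        self.link = list(range(1, size)) + [-1]     # second word: next pair
        self.free = 0                               # head of the free list

    def push(self, top, value):
        """Take a pair from the free list and link it in front of `top`."""
        if self.free == -1:
            raise MemoryError("free storage list exhausted")
        cell = self.free
        self.free = self.link[cell]
        self.value[cell], self.link[cell] = value, top
        return cell                                 # address of the new top

    def pop(self, top):
        """Unlink the top pair and return it to the free list."""
        value, rest = self.value[top], self.link[top]
        self.value[top] = None
        self.link[top] = self.free
        self.free = top
        return value, rest

    def to_list(self, top):
        """Collect the constituents from the top downward."""
        items = []
        while top != -1:
            items.append(self.value[top])
            top = self.link[top]
        return items

store = PairStore(4)
top = -1                           # an empty stack
top = store.push(top, "B")
top = store.push(top, "A")
contents = store.to_list(top)      # ["A", "B"]
value, top = store.pop(top)        # value is "A"; its pair is free again
```

Because a released pair goes back to the head of the free list, it is the
first one to be reused by the next push.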

In  EOL-4  the  majority  of  operations  for  handling  a  small  amount 
of  information  are  performed  on  stack  contents. 

3.4.   Files 

Files are used to keep expressions in the compact form of file records.
Each record is stored in consecutive storage locations of core memory. The
top constituent of a record constitutes its key, which enables one to look
for a record with this key.

In  order  to  illustrate  the  value  of  a  file,  we  write,  for  example, 

    NEW FILE:      r r r σ r r ↑ r σ r

where r represents a record, σ represents an end-of-section mark and the
symbol ↑ denotes a scanner which serves to distinguish one particular place
in a file. Reading from a file always involves the record which follows
directly after the scanner. Writing into a file is done by inserting a
record directly before or after the scanner.

In EOL-4 there are many instructions which permit one to process
files, for example:

(a)  Writing  the  content  of  a  stack  as  a  record  into  a  file. 

(b)  Reading  a  record  from  a  file  to  generate  stack  content. 

(c)  Shifting  the  scanner  forward  until  it  comes  across  an  end 
of  section  symbol  or  a  record  with  the  given  key. 

(d)  Storing  the  address  of  record,  placed  directly  after  the 
scanner,  as  the  top  constituent  of  an  indicated  stack. 

(e)  Placing  the  scanner  in  front  of  the  record  whose  address 
is  the  top  constituent  of  an  indicated  stack. 

The  two  last  operations  permit  one  to  manipulate  the  addresses 
of  records  in  stacks.   This  enables  one  to  determine  freely  the  file 
structure,  for  instance,  in  a  way  which  allows  an  effective  searching  of 
information  written  in  this  file. 

The  files  are  designed  to  store  medium  or  large  amounts  of 
information;  for  example,  the  symbol  tables  of  a  compiler. 
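
A minimal Python sketch of a file with a scanner is given below. It
assumes that a record address can be modelled as a list index, which stays
valid only while no record is inserted in front of it. The class and
method names are invented for the illustration and are not EOL-4
instructions.

```python
# Hypothetical sketch of a file: records, end-of-section marks, a scanner.

SECTION_END = object()                   # stands for the end-of-section mark

class File:
    def __init__(self):
        self.items = []                  # records and SECTION_END marks
        self.scanner = 0                 # index of the item after the scanner

    def write_before(self, record):
        """(a) Insert a record directly before the scanner."""
        self.items.insert(self.scanner, list(record))
        self.scanner += 1

    def end_section(self):
        """Insert an end-of-section mark directly before the scanner."""
        self.items.insert(self.scanner, SECTION_END)
        self.scanner += 1

    def read(self):
        """(b) Copy the record that follows the scanner."""
        return list(self.items[self.scanner])

    def skip_to_key(self, key):
        """(c) Move forward to an end-of-section mark or to the given key."""
        while self.scanner < len(self.items):
            item = self.items[self.scanner]
            if item is SECTION_END or item[0] == key:
                break
            self.scanner += 1

    def address(self):
        """(d) Address of the record directly after the scanner."""
        return self.scanner

    def place(self, address):
        """(e) Put the scanner in front of the record at `address`."""
        self.scanner = address

symbols = File()
for record in (["BEGIN", 1], ["X", 2], ["Y", 3]):
    symbols.write_before(record)
symbols.end_section()
symbols.place(0)
symbols.skip_to_key("X")
found = symbols.read()                   # ["X", 2]
saved = symbols.address()                # kept, e.g., on a stack
```

Keeping addresses such as saved on a stack is what allows a program to
return to a record later without scanning the file again.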

3.5.   Shelves

Shelves  are  intended  to  reside  on  secondary  storage,  such  as 
discs,  drums  or  magnetic  tapes.   Each  shelf  contains  a  sequence  of  variable 
length  blocks;  each  block  is  formed  from  a  file  content. 

To  illustrate  the  content  of  a  shelf  we  write,  for  example 

    SHX:      b b b σ b b ↑ b σ b

where SHX is the shelf name, b is a variable-length block, σ is the
end-of-section symbol and ↑ is the scanner, which plays a role similar to
that of the file scanner.

In EOL-4 there are several instructions which permit one to
process shelves, for example:

(a)  Writing a file content as a new block into a shelf at the
place indicated by its scanner. All blocks which previously followed
the scanner are no longer accessible by a program.

(b)  Reading a block content and putting it as a new file content.

(c)  Instructions which preserve and restore the scanner address.

Shelves are designed to store large amounts of information, for
instance, to store the output of consecutive passes in a multi-pass
translator.

3.6.   Declarations 

There  are  four  kinds  of  declarations  in  EOL-4. 

The first kind of declarations serves to introduce and to name
counters, stacks, files or shelves, for example:

    COUNTER  LA, LB, LC(5)

    STACK  SZX, Q(10)

    FILE  SYMB-TABLE

    SHELF  SHI, SH2

We can declare one-dimensional arrays of counters, stacks, files
or shelves.

The second kind of declarations serves to introduce and to name
switches, for example

    SW:   SWITCH  LA, LB, LC

where LA, LB and LC are names of labels. Switches are used in conjunction
with the GOTO instruction, which determines the index of a label in the
label-list appearing in the switch-declaration. If, for example, the top
constituent of the stack XI is the number 3 then the instruction

GOTO  SW  BY  XI 


is  equivalent  to  the  instruction 


GOTO  LC 

The  third  kind  of  declarations  serves  to  introduce  and  to  name 
tables,  for  example: 

    TA:   TABLE   'BEGIN', 'END', 'FOR'
    TT:   TABLE   (0,0,3),
                  (1,2,3),
                  (1,1,2)

The table TA contains three strings. By a suitable instruction
SEARCH, for example

    SEARCH  INDEX  SAX  IN  TA

the index in the table of the top constituent of the stack SAX is found
and pushed onto the stack SAX. For instance, if the top constituent in
the stack SAX was 'END' then the index found is equal to 2.

Having  read  a  string  to  the  top  of  the  stack,  we  can  find  its 
index  in  a  table  and  then  jump  by  a  switch  using  this  index.   In  this  way 
we  can  specify  an  action  corresponding  to  the  string  just  read. 
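
The combination of SEARCH and a switch can be imitated in Python as
follows. This is a sketch only: the stack is again a list with the top at
index 0, the labels are modelled by functions, and the behaviour for a
string that is missing from the table (here a ValueError raised by
list.index) is an assumption, since the text does not specify it.

```python
# Hypothetical sketch: look a string up in a table, then jump by a switch.

TA = ["BEGIN", "END", "FOR"]

def search_index(stack, table):
    """Push the 1-based table index of the top constituent onto the stack."""
    stack.insert(0, table.index(stack[0]) + 1)

def on_begin():
    return "open a block"

def on_end():
    return "close a block"

def on_for():
    return "start a loop"

SW = [on_begin, on_end, on_for]       # plays the role of a SWITCH declaration

def goto_by(switch, stack):
    """Select the label whose index is the top constituent of the stack."""
    return switch[stack[0] - 1]

sax = ["END"]
search_index(sax, TA)                 # sax now holds 2 on top of "END"
action = goto_by(SW, sax)()           # the action chosen for 'END'
```

With 'END' at the top of the stack the index found is 2, so the second
label of the switch is selected.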

Reading an integer from the table TT with the index (2,3) may
be executed as follows. We put the list {2 3} on the top of a stack so
that, for example:

    ST = {°2 °3}

and then we execute the instruction:

    FETCH  TO  ST  FROM  TT

After this instruction is executed the top of the stack ST is the
following

    ST = °3 {°2 °3} ...

Multi-dimensional tables are very useful in many syntax-analyzing
techniques, for instance in parsing an operator-precedence language.
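
The table lookup can be sketched in Python as below. Two points are
assumptions made for the illustration: that indices count from 1, as in
the SEARCH example, and that the fetched integer is pushed on top of the
index list rather than replacing it.

```python
# Hypothetical sketch of FETCH on a two-dimensional table.

TT = [(0, 0, 3),
      (1, 2, 3),
      (1, 1, 2)]

def fetch(stack, table):
    """Use the list at the top of the stack as a (row, column) index."""
    row, column = stack[0]
    stack.insert(0, table[row - 1][column - 1])

st = [[2, 3]]
fetch(st, TT)        # second row, third column of TT holds 3
```

In an operator-precedence parser the two indices could be the classes of
two neighbouring operators and the fetched entry their precedence
relation.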

The fourth kind of declarations introduces procedures. Procedures
may be nested, so that one procedure may be an internal procedure of another
one. EOL-4 permits one to compile independently the external procedures
which form one program. An example of a procedure is presented in
Section 3.9.

3.7.   Instructions

Instructions written in the EOL-4 language cause the execution
of some simple operations, such as changing a list of values or interrupting
the normal sequence of program execution. Each instruction consists of a
key word followed by not more than three instruction arguments, for example

MOVE  SAX  =  REVERSED  XI  UNTIL  '7' 
SCAN  F2  TIMES  3 
COMPUTE  X  =  X  +  1 

More  examples  of  instructions  can  be  found  in  the  following  program. 

3.8.   Programs

Programs written in the EOL-4 language consist of
    external declarations
    external calls
An external declaration may be a procedure, a variable, a switch or a
table declaration. The scope of an external declaration is the entire
program.

The  body  of  external  procedures  consists  of  declarations  and 
instructions.   Declarations  inside  of  procedures  have  a  local  scope.   The 
names  of  variables,  switches  and  tables  have  to  be  declared  prior  to  their 
usage. 

3.9.   Example  of  EOL-4  Program

Let us consider a program which reads and deletes the next
element of an arithmetic expression, and then adds this element as a
separate constituent at the end of the stack F-LINE. The formula is
written on the input in the form of a string of characters, and either
an arithmetic operator or an arbitrary string composed of letters and
digits constitutes an element of the expression. Blanks preceding an
arbitrary element should be removed. For example, let

    INPUT:   A + 3 + BETA
    F-LINE:  ~X1 ~=

When the procedure to be described has been executed four times, one
obtains

    INPUT:   BETA
    F-LINE:  ~X1 ~= ~A ~+ ~3 ~+

Let us denote this procedure by the identifier READ-ELEM. Then
the whole program has the following form:

(1)   STACK F-LINE;
(2)   READ-ELEM:  PROCEDURE (X);
(3)      STACK X;
(4)      CLEAR INPUT UNTIL NOT BLANK;
(5)      IF INPUT = LETTER OR DIGIT THEN
(6)         MOVE X = INPUT UNTIL BLANK OR REST;
(7)      ELSE MOVE X = INPUT TIMES 1;
(8)      RETURN;
(9)   END;
(10)  CALL READ-ELEM (F-LINE)

The particular lines of the above program have the following
meaning:

(1)  External declaration of the stack F-LINE.

(2)  Heading of the procedure READ-ELEM with the
formal parameter X.

(3)  Local declaration of X as a stack.

(4)  Clearing all initial blanks.

(5)  Testing if the current input character is alphanumeric.

(6)  If yes, then read into the stack X a character sequence
up to the nearest character BLANK or character of the
class REST. (All the characters are divided into four
classes: DIGIT, LETTER, BLANK and REST.)

(7)  If not, then read into X a single (TIMES 1) character from
the INPUT.

(8)  Return to the calling program.

(9)  End of the procedure.

(10)  Call of the procedure with the actual parameter F-LINE.
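
For readers who want to trace the procedure, here is a Python rendering of
READ-ELEM. It is a sketch rather than a translation of EOL-4 semantics:
the class tests rely on Python's str.isdigit and str.isalpha, the input
is wrapped in a small Input class invented here, and returning silently
on exhausted input is an assumption, since the text does not say what
happens in that case.

```python
# Hypothetical Python rendering of the procedure READ-ELEM.

def char_class(ch):
    """Assign a character to one of the four classes named in the text."""
    if ch.isdigit():
        return "DIGIT"
    if ch.isalpha():
        return "LETTER"
    if ch == " ":
        return "BLANK"
    return "REST"

class Input:
    """The input string; characters are deleted as they are read."""
    def __init__(self, text):
        self.text = text

def read_elem(inp, stack):
    inp.text = inp.text.lstrip(" ")                         # line (4)
    if not inp.text:
        return                                              # assumption: nothing left
    if char_class(inp.text[0]) in ("LETTER", "DIGIT"):      # line (5)
        n = 0                                               # line (6)
        while n < len(inp.text) and \
                char_class(inp.text[n]) in ("LETTER", "DIGIT"):
            n += 1
    else:
        n = 1                                               # line (7)
    stack.append(inp.text[:n])        # new constituent at the end of the stack
    inp.text = inp.text[n:]           # the element is deleted from the input

source = Input("A + 3 + BETA")
f_line = ["X1", "="]
for _ in range(4):
    read_elem(source, f_line)
```

After the four calls f_line holds X1, =, A, +, 3, + and the input still
contains BETA, preceded by one blank that the next call would clear.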

3.10.   EOL-4  Language  Extension

EOL-4 is a relatively concise language, implemented with a basic
set of instructions for symbol manipulation. More complex operations can
be defined as procedures. However, in many applications such procedures
can be fairly complex and their execution time rather long. In order to
make them more efficient, EOL-4 procedures can also be written in machine
language. As a rule they are based on the data structure accepted in
EOL-4, and use the Interpreter subroutines. Thus, these procedures may be
created as EOL-4 language extensions defining specialized operations,
adapted to a given problem.

The EOL-4 procedures written in machine language are often first
written and debugged in the EOL-4 language and then translated into machine
language by a programmer who is familiar with the Interpreter and the EOL-4
data structure. This system permits one to obtain a full program
documentation in the EOL-4 language and possibly to run the program on
another computer equipped with the EOL-4 Interpreter.


4.   MOL  LANGUAGE

MOL is a macro-language which, in particular, can serve for code
generation in the way discussed in Section 2. The MOL notation is modelled
after PL/1; in fact, MOL was based on a small subset of PL/1, comprising
variables of the type FIXED, POINTER and CHARACTER (fixed and varying). A
more significant difference is the introduction into MOL of lists, which can
be declared in a LIST declaration or used as the actual parameters in
macro-calls. All the MOL procedures, named macro-definitions, may be called
recursively. In most other aspects, however, they are similar to PL/1
procedures. The recursiveness of MOL procedures is necessary to analyze
lists of any depth.

Programs written in MOL consist of a set of external macro-
definitions followed by a sequence of external macro-calls. Both external
macro-definitions and external calls can be assembled independently of the
rest of a program. In particular, the internal structure of external calls
is exactly the same as the structure of those EOL-4 records that are
equipped with a key, that is, that have a character string as their first
constituent. This constituent serves as the key in a record or as a
macro-call name. Records of this type can be interpreted directly by MOL
as external macro-calls. The ability of MOL to analyze lists makes it well
suited to generating code on the basis of syntactic units in list form as
generated by EOL-4.
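
The correspondence between keyed records and macro-calls can be pictured
with a short Python sketch. It is an illustration only, not the MOL
implementation: the function names, the dictionary of macro-definitions and
the macro MOVE are hypothetical, and Python lists stand in for EOL-4 records.

```python
def is_keyed_record(record):
    """A keyed record is a list whose first constituent is a character string."""
    return isinstance(record, list) and bool(record) and isinstance(record[0], str)


def interpret(record, macros):
    """Treat a keyed record as an external macro-call and return the printed lines."""
    if not is_keyed_record(record):
        raise ValueError("record has no key, so it cannot be read as a macro-call")
    name, *actuals = record          # the key doubles as the macro-call name
    if name not in macros:
        raise KeyError("no macro-definition named " + name)
    return macros[name](*actuals)


def move(dst, src):
    """Hypothetical macro-definition MOVE (not from the report): copy src to dst."""
    return ["LOAD " + src, "STORE " + dst]


MACROS = {"MOVE": move}

if __name__ == "__main__":
    for line in interpret(["MOVE", "A", "B"], MACROS):
        print(line)
```

In this picture the recognition part only has to emit records whose first
constituent names a macro-definition; no separate call format is needed.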

The application of the MOL language is illustrated by the
following two examples.


Macro-Definition  SUM 


Let the following external macro-definition be given:

(1)   SUM : MACRO S, X;
(2)      /* S = X(1) +...+ X(N) */
(3)      DECLARE I FIXED;
(4)      +     LOAD    .X(1);
(5)      DO I = 2 TO DEGREE(X);
(6)      +     ADD     .X(I);
(7)      END;
(8)      +     STORE   .S;
(9)   END;

This macro-definition may be called by

(10)  * SUM P, (A,B,C);

with the result:

(11)     LOAD    A
(12)     ADD     B
(13)     ADD     C
(14)     STORE   P


Remarks

(1)     Macro-definition heading. The macro-name SUM has
        external scope.

(2)     A comment.

(3)     Declaration of the variable I as an integer. The variable I
        has local scope, confined to the macro-definition SUM.
        To give it external scope, the attribute EXTERNAL should be
        added to the declaration.

        In MOL, all variables must be declared before they are used.

(4)     An instruction which begins with the character "+" is a
        print instruction. The string between the leading "+" and
        the nearest ";" represents the "picture" of the information
        which is to be printed. The point before an identifier
        indicates a variable; the subscript "1" of the variable (in
        our example "X") indicates the first constituent of the
        actual list substituted for the formal parameter X. If the
        actual value is not a list, the null string is substituted
        in place of .X(1).

(5-7)   DO loop (same meaning as if it were written in PL/I).

(5)     DEGREE(X) is a built-in MOL function whose value equals the
        degree (number of immediate constituents) of the list which
        is substituted as an actual parameter in the place of X.

(9)     End of the macro-definition.

(10)    Macro-call of the macro-definition SUM. The actual parameters
        of the call are the single-character string P and the list
        (A,B,C). The commas serving as delimiters in macro-calls
        or lists may be replaced by spaces, so that the call (10)
        may be written as

        *   SUM   P  (A  B  C);

        The asterisk in front of the macro-name is equivalent to "CALL".

(11-14) Result of performing the macro-call (10) of the macro-
        definition (1-9).
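
The expansion of SUM can be traced with a minimal Python sketch. It is a
simulation under stated assumptions, not part of MOL: the helper names are
hypothetical, a Python list stands in for a MOL list, print instructions
are modelled by appending strings to a result list, and returning a degree
of 0 for a non-list value is an assumption that the text does not confirm.

```python
def degree(value):
    """DEGREE: number of immediate constituents of a list.
    Returning 0 for a non-list value is an assumption of this sketch."""
    return len(value) if isinstance(value, list) else 0


def constituent(value, index):
    """X(index), counted from 1; the null string if value is not a list."""
    if not isinstance(value, list):
        return ""
    return value[index - 1]


def expand_sum(s, x):
    """Return the lines printed by the macro-call  * SUM s, x;"""
    lines = ["LOAD " + constituent(x, 1)]          # listing line (4)
    for i in range(2, degree(x) + 1):              # listing lines (5)-(7)
        lines.append("ADD " + constituent(x, i))   # listing line (6)
    lines.append("STORE " + s)                     # listing line (8)
    return lines


if __name__ == "__main__":
    for line in expand_sum("P", ["A", "B", "C"]):
        print(line)
```

Calling expand_sum("P", ["A", "B", "C"]) mirrors the macro-call (10): one
LOAD for the first constituent, one ADD for each remaining constituent, and
a final STORE.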

Macro-Definition  EXPR 

Let  the  following  external  macro-definitions  EXPR  and  ASSIGN 
be  given: 

(1)   EXPR : MACRO E;
(2)      DECLARE OP LIST (+ *),
(3)              CODE LIST (ADD MUL);
(4)      IF TYPE(E) NE 'LIST' THEN DO;
(5)         +     PUSH   .E;
(6)         RETURN;
(7)      END;
(8)      * EXPR   .E(1);
(9)      * EXPR   .E(3);
(10)     +     .CODE(ELEM(E(2),OP));
(11)     RETURN;
(12)  END;

(13)  ASSIGN : MACRO Y EQ X;
(14)     * EXPR   .X;
(15)     +     STORE  .Y;
(16)  END;

The macro-definition EXPR may be called, for instance, by the
following macro-call:

(17)  *  ASSIGN  Z  =  (P  +  (Q  *  R)); 
with  the  result 

(18)     PUSH   P
(19)     PUSH   Q
(20)     PUSH   R
(21)     MUL
(22)     ADD
(23)     STORE  Z

This result may be considered as machine code for a computer
with a stack for performing arithmetic operations.

Remarks

(2-3)   Declaration of the list variable OP with the initial
        value (+ *) and the list variable CODE with the initial
        value (ADD MUL). In any list-variable declaration its
        initial value must be given.

(4)     Test whether the actual value substituted for the formal
        parameter E is something other than a list. If so, lines
        (5-6) print a PUSH instruction for it and return.

(8)     Call for the procedure EXPR with the variable E(1) as
        the actual parameter. The point before E(1) indicates
        a variable; the subscript 1 indicates the first
        constituent of the value substituted for E.

        Note that this call is recursive, i.e., the macro EXPR
        is called again before reaching the RETURN statement.

(9)     Call similar to the previous one, this time with the
        variable E(3).

(10)    The function ELEM(E(2),OP) has as its value the index of
        the value of E(2) within the list OP. If, for instance, the
        value of E(2) is "+", its index in the list OP is 1; if
        this value is "*", its index is 2. If E(2) is not found
        at all in the list OP, its index is zero. The value of
        the subscripted variable CODE(ELEM(E(2),OP)) is therefore
        equal to "ADD" or "MUL" when the value of E(2) is equal to
        "+" or "*", respectively.

(13)    The second formal parameter "EQ" is never used in the
        macro-definition body (lines 14 and 15). Therefore, in a
        macro-call like (17) any value may be used for the second
        actual parameter. This enables us to use the symbol
        "=" in line 17 as a comment, which can be exchanged
        for any other non-empty string.

(18-23)   This  result  is  supposed  to  be  a  program  written  in  an 
assembly  language  for  a  computer  with  a  stack. 
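
The recursion in EXPR can be followed with a minimal Python sketch. It is
a simulation under stated assumptions rather than MOL itself: the function
names are hypothetical, nested Python lists stand in for MOL lists, printed
lines are collected in a list, and raising an error when ELEM yields zero
is an assumption, since the text does not say what CODE(0) denotes.

```python
OP = ["+", "*"]        # list variable OP, listing line (2)
CODE = ["ADD", "MUL"]  # list variable CODE, listing line (3)


def elem(value, lst):
    """ELEM: index of value in lst, counted from 1; zero if it is absent."""
    return lst.index(value) + 1 if value in lst else 0


def expr(e, out):
    """Append to out the lines printed by the macro-call  * EXPR e;"""
    if not isinstance(e, list):      # line (4): the value is not a list
        out.append("PUSH " + e)      # line (5)
        return                       # line (6)
    expr(e[0], out)                  # line (8): recursive call on E(1)
    expr(e[2], out)                  # line (9): recursive call on E(3)
    index = elem(e[1], OP)           # line (10): look up the operator E(2)
    if index == 0:
        # Assumption of this sketch: an unknown operator is reported as an error.
        raise ValueError("operator not in OP: " + repr(e[1]))
    out.append(CODE[index - 1])


def assign(y, eq, x):
    """Lines printed by  * ASSIGN y eq x;  (eq is ignored, as in line (13))."""
    out = []
    expr(x, out)                     # line (14)
    out.append("STORE " + y)         # line (15)
    return out


if __name__ == "__main__":
    for line in assign("Z", "=", ["P", "+", ["Q", "*", "R"]]):
        print(line)
```

Traced on the call ASSIGN Z = (P + (Q * R)), the sketch visits the left
operand first, then the right operand, and emits the operator last, so it
produces PUSH P, PUSH Q, PUSH R, MUL, ADD, STORE Z. This is the postorder
traversal that a stack machine expects.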

From the above discussion it is easy to see that MOL can be used
independently of EOL as a macro-processor. In this case MOL enables one
to build one's own language, with all its statements in the form of
macro-calls and with their meaning defined by the corresponding
macro-definitions.


5.   THE  TWT  IMPLEMENTATION

The EOL-4 and MOL implementation is based on the bootstrapping
principle. As the base of bootstrapping the very simple macro-processor
SIMCMP [10] is used, which can be implemented with about 100 FORTRAN IV
statements or an equivalent number of assembler instructions.
With the aid of SIMCMP a simple symbol manipulation language FLUB/TWT
(First Language Under Bootstrapping/TWT) is implemented [11]. FLUB/TWT is
the language in which both the EOL-4 and MOL interpreters are written. The
EOL-4 and MOL translators are written in EOL-4. The first introduction of
these translators into a machine again requires the use of SIMCMP.

Once the TWT system is implemented and debugged on one computer,
the effort needed to transfer the implementation to another computer is
not high; the transfer can be done in a few weeks by a person who is
familiar with both TWT and the object computer. The implementation
obtained in this way is fully usable but slow. By investing a few
additional man-months in the optimization of the EOL and MOL interpreters
we can easily reach the point at which programs written in these languages
run fast.


6.   THE  TWT  APPLICATION

The TWT system can be used for teaching students many compiler-writing
techniques and allows them to write term projects consisting of
the design of a new language, the definition of its semantics, and the
writing of a syntax analyzer and a code generator for an object computer.

When applied to the implementation of actual languages, TWT allows
one to write comparatively quickly a translator which runs slowly but
generates fairly good object code; in fact, in this respect TWT poses no
special limits. Using TWT we can, therefore, deliver a translator to the
user in a comparatively short time. At the same time, the semantics of the
language can be thoroughly checked out and the appropriate changes to the
language can be made. Once the language is frozen we can start to optimize
the translator to obtain a shorter translation time. In our experience so
far, the translation speed depends mostly on two factors: the manipulation
of symbol tables and a dozen or so operations which, however simple,
are repeated frequently during translation. Therefore, by replacing
a few parts of the translator by patches written in machine language we
can achieve a considerable reduction of the translation time. Changes in
the MOL part usually are not necessary. The changes in the EOL-4 part are
not difficult because, as a rule, we do not need to change the data
structure. The multiplicity of data forms in EOL-4 is therefore a
considerable advantage from the point of view of program optimization.

In the future we intend to create a standard subroutine library,
written in EOL-4, which will contain many techniques of syntax analysis and
other methods used in the recognition part of translators. In this way a
language designer will be able to choose the subroutines best suited to
his problem, thus considerably lowering the effort of writing his
translator.